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ABSTRACT 

A  simple,  yet  complete  and  detailed  description  of  the  Fast  Fourier 
Transform  for  general  N  is  given,  with  the  aim  of  making  the  underlying 
idea  quite  apparent.  To  help  with  this  didactic  goal,  a  simple  twist,  i.e., 
a  shifting  of  information  from  rows  to  columns  during  the  calculations,  is 
introduced  which  allows  to  give  a  simple  meaning  to  intermediate  results 
and  assures  that  the  final  results  need  no  further  reordering. 
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Significance  and  Explanation 


In  the  last  twenty  years,  various  forms  of  a  fast  Fourier 

transform  (FFT)  have  become  popular.  The  various  algorithms  have 

in  common  that  they  produce  the  discrete  Fourier  transform  on  N 

2 

points  in  about  N  ln(N)  rather  than  N  operations.  But,  while 
these  ideas,  notably  through  Cooley  &  Tukey,  have  found  wide  appli¬ 
cation  in  computations,  their  didactic  treatment  has  left  something 
to  be  desired. 

This  report  is  an  attempt  to  remedy  this  situation. 


The  responsibility  for  the  wording  and  views  expressed  in  this  flesci 
summary  lies  with  MRC,  and  not  with  the  author  of  this  report. 


FFT  AS  NESTED  MULTIPLICATION,  WITH  A  TWIST 


Carl  de  Boor 


1.  Introduction.  The  discrete  Fourier  transform  (DFT)  z  =  F  z  of  an 
111  =  N= 


N-vector  z  is  given  by  the  rule 

...  r  ,  (v-l)(n-l) 

(1)  2v  =  l  Zn“N 

n*l 


1 1  •  •  •  t 


with 

(2)  Ujj  =  exp(-/^l  2ir/N) 

a  principal  N-th  root  of  unity.  Thus,  z^  is  given  as  the  value  of  a  polynomial 
of  degree  <  N  at  the  point  1  and  can  therefore  be  calculated,  by  nested 

multiplication,  in  N  operations.  Here,  I  follow  Cooley  &  Tukey  11]  in  counting 
a  complex  multiplication  followed  by  a  complex  addition  as  one  operation. 

In  the  last  twenty  years,  various  forms  of  a  fast  Fourier  transform  (FFT) 
have  become  popular.  The  various  algorithms  have  in  common  that  they  produce  the 
DFT  on  N  points  in  about  N  ln(N)  rather  than  H2  operations.  See  Winograd 
[10]  for  the  latest  developments.  But,  while  these  ideas,  notably  through  Cooley 
S  Tukey  [1],  have  found  wide  application  in  computations,  their  didactic  treatment 
has  left  something  to  be  desired. 

In  a  recent  article  [7],  H.  R.  Schwarz  attempts,  as  he  says,  to  remove  the 
mystical  aspect  which  the  FFT  has  for  many  people.  He  does  this  by  describing  the 
FFT  in  terms  of  a  factorization  of  the  transformation  matrix,  an  idea  which  he 
ascribes  to  Theilheimer  [9]  but  which  occurs  already  in  Good  [5]  where  a  FFT 
different  from  that  of  Cooley  &  Tukey  is  given.  A  factorization  of  the  transforma¬ 
tion  matrix  is  also  the  basic  idea  on  which  Glassman  (4)  builds  his  FFT,  and  Drubin 
[2]  has  refined  this  further;  see  Ferguson  [3]  for  a  lucid  description  and  a  simple 
Fortran  program. 

By  contrast,  I  want  to  give  here  what  I  believe  to  be  a  simple  description  of 
the  FFT  for  a  general  N  in  terms  of  nested  multiplication.  Certainly,  Cooley  c 
Tukey  II]  thought  of  the  FFT  in  these  terms. 
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2.  The  case  of  two  factors.  Suppose  that  N  =  PQ  for  two  integers  P 
and  Q  greater  than  1  .  Think  of  the  N-vector  z  as  stored  Fortran  fashion  in 
a  one-dimensional  array.  Then  we  can  interpret  that  array  also  Fortran  fashion 
as  a  two-dimensional  array  Z  ,  of  dimension  (P,Q) .  This  means  that 

<3)  Z(p'q)  =  Zp+P (q-1)  '  P  “  1 . P'  q  =  1 . Q  • 

Correspondingly,  factor  the  sum  (1)  for  z^  into  a  double  sum. 


?  9  ,  (v-1)  [p-1  +  P(q-l)] 

l  l  Z(p,q)  <d 

p-i  q-i 

pi  f 1 


Here,  we  have  made  use  of  the  fact  that 

%  =  “Q  • 

This  makes  apparent  the  crucial  fact  that  the  inner  sum  in  the  last  right  hand 
side  is  Q-periodic  in  v  ,  i.e.,  replacing  v  by  v+Q  does  not  change  its  value, 
due  to  the  fact  that  u®  =  1  .  This  means  that  we  need  only  calculate  this  sum 
for  v  =  1,...,Q  (and  for  each  p) .  Thus,  for  each  p  -  1,...,P,  we  calculate 
from  the  Q-vector  Z (p , • )  the  Q-vector  whose  entries  are  the  numbers 


9  .  (v-lHq-l) 

I  Z  (p,q)  id  . 

q=l  * 


v  =  1, ...  ,<2  , 


i.e.,  we  calculate  the  DFT  FgZ(p,0,  p  =  1,...,P,  at  a  total  cost  of  P*Q  = 

N*Q  operations. 

Now,  we  could  store  the  transform  of  Z(p,*)  over  Z(p,*).  But  in  anti¬ 
cipation  of  further  developments,  we  choose  to  store  the  transform  F^Z (p, •)  in 
Z^ ( • ,p) ,  where  Z^  is  a  two  dimensional  array  of  size  (Q,P)  ,  rather  than  (P,Q) . 
With  this,  the  calculation  of  zy  is  reduced  to  the  evaluation  of  the  sum 

(5)  z  *  l  Z.(v  ,p)  «4V  1)  <P  1)  *  v  »  1,...,N  . 

v  p»l  « 

Here,  we  have  used  the  notation  to  indicate  the  integer  between  1  and  0 

for  which  v  -  Vg  is  divisible  by  Q  .  At  this  point,  it  becomes  convenient  to 
think  of  the  one-dimensional  array  which  is  to  contain  the  N-vector  z  equivalently 
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as  a  two-dimensional  array  2  ,  of  size  (Q,P) .  This  means  that 


(6) 


\  +  Q(w-l)  =  Z0(V'U)  '  for  v  =  1 . Q-  11  *  1 . P 


With  this,  (5)  can  be  written  equivalently  as 

z0(v,u)  =  l  Z  <v,p)  Jv-l  +  Q(v-l)](p-l) 

P=1 

v  *  1  f  •  •  •  fQf  y  =  1 1  •  •  •  f  P  • 
Here,  the  right  hand  side  is  a  polynomial  of  degree  <  P  in  the  quantity 
1  +  0(u  1).  This  quantity  can  be  generated  as  one  goes  along,  as  in  the 
following  convenient  arrangement  of  the  calculations: 
x  1 

for  u  »  1, . . . ,P  ,  do: 


(7) 


for  v  =  1 , . . .  ,Q  ,  do: 

P 

ZQ(v,u)  :*  \  Z^ ( V,p) 

p-1 


-1 


X  :■  X 


*  “N 


The  sum  in  the  innermost  loop  is,  of  course,  to  be  evaluated  by  nested  multi- 

2 

plication.  The  total  cost  of  this  step  is  then  Q*P  =  N*P  operations  (if  we 

neglect  the  N  multiplications  needed  to  generate  the  various  x’s).  In  this 

way,  we  have  obtained  in  Z^  the  discrete  Fourier  transform  z  of  z  at  a 

2 

cost  of  only  N(P+Q)  rather  than  N  operations. 


3 


S^^jhgjenejral^cgse.  "It  is  easy  to  see  how  successive  applications  of 
the  above  procedure,  starting  with  its  application  to  (4) ,  give  an  m-step  algo¬ 
rithm  requiring 

T  »  N(P,  +  P_  +  . . .  +  P  ) 

12  n 

operations,  where 

(8)  N  «  Pj-Pj-..- *V" 

So  say  Cooley  &  Tukey  [1]  (except  for  a  change  in  symbols  and  equation  numbers) . 

In  effect,  they  point  out  that  the  first  step  of  the  calculations  above  consists 

in  forming  the  DFT  of  various  Q-vectors.  Hence  if  Q  itself  is  the  product  of 

two  integers  greater  than  1  ,  this  calculation  can  be  carried  out  in  fewer  than 
2 

Q  operations  by  applying  the  same  procedure  to  it,  etc.  The  actual  implementa¬ 
tion  of  this  idea  may  not  be  immediately  obvious,  though.  For  this  reason,  I  now 
discuss  a  slightly  different  (and  novel)  view,  according  to  which  the  entire  trans¬ 
form  can  be  effected  by  m  applications  of  a  slightly  enlarged  version  of  (7) . 

The  basic  idea  is  to  interpret  the  storage  arrays  for  the  various  N-vectors 
involved  in  various  ways  as  multidimensional  arrays  and  to  shift  information 
appropriately  from  "rows  to  columns"  as  we  did  earlier  when  storing  the  DFT  of  the 
row  Z(p,*)  of  Z  in  the  column  Z^(*,p)  of  Z^  .  For  this,  I  need  same  nota¬ 
tion  to  indicate  that  a  given  one-dimensional  array  is  being  considered  equiva¬ 
lently  as  a  two-  or  three-dimensional  array. 

A 

If  Z  is  a  one -dimensional  array  of  length  N  ,  then  Z  denotes  the  equiva- 

A  B 

lent  two-dimensional  array  of  dimension  (A,N/A) ,  and  Z  '  denotes  the  equivalent 

three-dimensional  array  of  dimension  (A,B,N/ (AB) ) .  Thus 

1^1  ^  (a,b,c)  ■  z  (a ,b+B (c-1) )  ■  Z  (a+A(b-l) ,c) 

-  Z(a+A(b-1+B(c-1) ) )  . 

Let  now  Z  be  a  one-dimensional  array  containing  z  ,  as  before,  and,  for 
k  •  0,...,m  ,  let  ^  be  «  one-dimensional  array  satisfying 
(10)  Z*(.,c)  ■  F  !lW(c,0,  c  *  1,...,BP, 

with 
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(U)  B  ==  Bk  :=  Pl.....Pk_r  P:=V  A  ,=  A*  :-  Pm  *  - . .  •PfcfI 

Then  Z  =  Z  ,  and  z„  contains  z  =  F„z  .  Further,  with  A,  P,  B  as  given  by 
m  0  =  N= 

(II),  one  obtains  z]c_1  from  Zk  by  the  following  slightly  extended  version  of  (7): 


for  p  =  1, . . .  ,P  ,  do: 

for  a  =  1, . .. ,A  ,  do: 


p 

„A,P,  .  .  r  _A,B,  .  Ti-l 

_ Zk'1(a,p,b)  :=  I  Zk'  (a,b,7r)  •  x 


Indeed,  the  algorithm  produces 


A,P ,  .  .  r  „A,B,  .  ,  (a-1  +  A(p-l) )  ( it— 1) 

Zk-l(a,p'b>  ”  £  \  (a'b'1,)  “ap 


On  the  other  hand,  (10)  implies  that 


Z*,B(*,b,7r)  =  F  ZB,P(b,7t,  •)  *  \  ZB,P(b,ir,a)  u. 


(-1)  (a-1) 


Therefore, 

_A,B .  r  r  _B,P,.  ,  P(a-l)(a-l)  +  ta-l+A(p-l)  ]  (ti-1) 

3k-l<a'p,b)  “  i  l  z  •*»<*>  “ap 

77*1  a*l 
Ap 

But  now,  since  *  1  ,  we  may  add  to  the  exponent  on  the  right  hand  side  any 

integer  multiple  of  AP  ,  and  this  allows  the  conclusion  that 

Z*'P  (a,p,b)  =  l  l  ZB,P(b,ir,a)  “  la~1+A(P_1)  1  Iti-1+P (a-1)  ] 

JC“i  -  AP 

7T*1  a**l 

and  so  proves  that  zk_1  ,  as  produced  by  (12),  satisfies  (10)  (with  k  replaced 
by  k-1)  . 

This  shows  that  the  DFT  z  is  obtainable,  in  ZQ  ,  by  m  applications  of 

algorithm  (21) ,  starting  from  Z  containing  z  .  Since  the  k-th  such  applies- 

in  = 

tion  costs  Pj^B^Pj^  “  N*Px  °P«r®tions,  the  total  number  of  required  operations  is 
indeed  given  by  (8) . 

In  a  Fortran  implementation  of  the  algorithm,  one  would,  of  course,  need  only 


two  arrays  to  play  the  role,  in  alternation,  of  the  m+1  arrays  Z  . 

W  0 


oooooooooooonoooooo 


SUBROUTINE  FFT  (  Z1,  12,  f!,  INZEE  ) 

CONSTRUCTS  THE  DISCRETE  FOURIER  TRANSFORM  O^  Z1  (OR  12)  IN  THE  COOLEY- 

r  TIlIfFY  UAY  RUT  WTTH  A  TUTRT 

INTEGER  INZEE, N,  AFTER  *  BEFORE , NEXT , NEXTMX, NOW , PR IME ( 1 2) 

COMPLEX  Z  1(N),Z2(N) 

**»«  INPUT  »»»»»» 

Z 1 ,  Z2  COMPLEX  N-VECTORS 
N  LENGTH  OF  Z1  AND  12 

INZEE  INTEGER  INDICATING  WHETHER  Z1  OR  12  IS  TO  BE  TRANSFORMED 
s  1  ,  TRANSFORM  Zt 

=  2  ,  TRANSFORM  12 

WORK  AREAS  »***»» 

Z1,  12  ARE  BOTH  USED  AS  WORKARRAYS 
>••••  OUTPUT 

Z1  OR  12  CONTAINS  THE  DESIRED  TRANSFORM  (IN  THE  CORRECT  ORDER) 
INZEE  INTEGER  INDICATING  WHETHER  Z1  OR  12  CONTAINS  THE  TRANSFORM, 

=  1  ,  TRANSFORM  IS  IN  Z1 

=  2  ,  TRANSFORM  IS  IN  12 

***•  METHOD  **•**» 

THE  INTEGER  N  IS  DIVIDED  INTO  ITS  PRIME  FACTORS  (UP  TO  A  POINT). 
FOR  EACH  SUCH  FACTOR  P  ,  THE  P-TRANSFORM  OF  APPROPRIATE  P-SUBVECTORS 
OF  Z1  (OR  12)  IS  CALCULATED  IN  F  F  T  S  T  P  AND  STORED  IN  A  SUIT¬ 
ABLE  WAY  IN  12  (OR  Z1).  SEE  TEXT  FOR  DETAILS. 

DATA  NEXTMX, PRIME  /  12,  2, 3, 5, 7 , 1 1 , 1 3, 17, 19 , 23, 29 , 31 , 37  / 

AFTER  s  1 
BEFORE  =  N 
NEXT  =  1 
C 

10  IF  ( (BEFORE/ PR IME (NEXT )) *PR IME (NEXT )  .LT.  BEFORE)  THEN 
NEXT  =  NEXT  ♦  1 
IF  (NEXT  .LE.  NEXTMX)  THEN 

GO  TO  10 

ELSE 

NOW  r  BEFORE 
BEFORE  =  1 
END  IF 
ELSE 

NOW  =  PRIME(NEXT) 

BEFORE  =  BEFORE/PR IME (NEXT ) 

END  IF 
C 

IF  (INZEE  .EQ.  1)  THEN 

CALL  FFTSTP(  Z1,  AFTER,  NOW,  BEFORE,  Z2  ) 

ELSE 

CALL  FFTSTP (  Z2,  AFTER,  NOW,  BEFORE,  Z1  ) 

END  IF 

INZEE  =  3  -  INZEE 

IF  (BEFORE  .EQ.  1)  RETURN 

AFTER  r  AFTER*NOW 


GO  TO  10 


SUBROUTINE  FFTSTP  (  ZIN,  AFTER,  NOW,  BEFORE,  ZOUT  ) 

CALLED  IN  F  F  T  . 

CARRIES  OUT  ONE  STEP  OF  THE  DISCRETE  FAST  COURIER  TRANSFORM, 

INTEGER  AFTER, BEFORE, NOW,  IA,I3,IN,J 
REAL  ANGLE, RATIO, TWOPI 

COMPLEX  ZINCAFTER, BEFORE, NOW) , ZOUT ( AFTER, NOW , BEFORE) ,  ARC, OMEGA, 
*  VALUE 

DATA  TWOPI  /  6.2831  85307  17958  64769  / 

ANGLE  =  TWOPI /FLOA  KNOW*  AFTER  ) 

OMEGA  =  CMPLX (COS (ANGLE) ,-SIN(ANGLE) ) 

ARG  =  CMPLX ( 1 . , 0, ) 

DO  100  J=1 ,NOW 

DO  90  IA  =  1 .AFTER 

DO  80  IB= 1 , BEFORE 

VALUE  =  ZIN(IA, IB, NOW) 

DO  70  INrNOW-1 , 1,-1 

70  VALUE  =  VALUE*ARG  +  Z IN (I  A , IB , IN ) 

30  ZOUT(IA, J, IB)  =  VALUE 

90  ARG  =  A  RG*OMEGA 

100  CONTINUE 

RETURN 

END 


There  is  no  claim  that  the  above  program  is  competitive  with  the  carefully 
constructed  codes  such  as  that  of  Singleton  [8] .  Its  virtue  lies  chiefly  in  its 
simplicity  and  transparency.  On  the  other  hand,  Eric  Grosse  [6]  found  that  the 
above  code,  modified  to  give  special  treatment  in  FFTSTP  for  the  case  NOW  =  2, 
and  to  avoid  subroutine  calls  for  the  complex  arithmetic  operations,  and  compiled 
by  an  optimizing  compiler,  needed  only  1.5  to  2  times  as  much  computing  time  as 
did  Singleton's  program  for  a  variety  of  choices  of  N  . 

Finally,  the  above  discussion  is  based  on  the  Fortran  convention  whereby 
multidimensional  arrays  are  stored  "column  by  column",  i.e.,  with  the  first  index 
running  fastest.  It  is  easy  to  base  the  discussion  instead  on  the  Algol  convention 
whereby  arrays  are  stored  "row  by  row",  i.e.,  with  the  last  index  running  fastest. 

JgJSBSKiSSlSSBSSi-  1  grateful  to  Warren  Ferguson  for  several  discussions 
concerning  fast  Fourier  transforms  and  for  comments  on  an  earlier  draft.  I  am  in¬ 
debted  to  Eric  Grosse  for  carrying  out  the  comparisons  mentioned  above  and  for  sug¬ 
gesting  that  the  more  leisurely  discussion  in  an  earlier  draft  be  replaced  by 
showing  directly  that  the  as  generated  by  (12)  satisfy  (10) . 
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